Search Results: "julian"

31 October 2016

Chris Lamb: Free software activities in October 2016

Here is my monthly update covering what I have been doing in the free software world (previously):

Made a large number of improvements to travis.debian.net, my hosted service for projects that host their Debian packaging on GitHub to use the Travis CI continuous integration platform to test builds on every code change:
- Enabled the use of Git submodules. Thanks to @unera & @hosiet. (#30)
- Managed a contribution from @xhaakon to allow adding an extra repository for custom dependencies. (#17)
- Fixed an issue where builds did not work under Debian Wheezy or Ubuntu Trusty due to a call to dpkg-buildpackage --show-field. (#28)
- Fixed an issue where TRAVIS_DEBIAN_EXTRA_REPOSITORY was accidentally required. (#27)
- Made a number of miscellaneous cosmetic improvements. (f7e5b080 & 037de91cc, etc.)
Submitted a pull request to Alabaster, the default theme for the Python Sphinx documentation system, to ensure that "extra navigation links" are rendered reproducibly. (#90)
Improved my Chrome extension for the FastMail web interface:
- Managed a pull request from @jlerner to add an optional confirmation dialogue before sending any message. (#10)
- Added an optional Ctrl+Enter alias for Alt+Enter to limit searches to the current folder; the latter shortcut is already mapped by my window manager. (d691b07)
- Various cosmetic changes to the options page. (7b95e887 & 833ff0fe)
Submitted two pull requests to mypy, an experimental static type checker for Python:
- Ensure that the output of --usage is reproducible. (#2234)
- Update the --usage output to match the now-reproducible output. (#2235)
Updated django-slack, my library to easily post messages to the Slack group-messaging utility:
- Merged a feature from @lvpython to add an option to post the message as the authenticated user rather than the specified one. (#59)
- Merged a documentation update from @ataylor32 regarding the new method of generating access tokens. (#58)
Made a number of cosmetic improvements to AptFs, my FUSE-based filesystem that provides a view on unpacked Debian source packages as regular folders.
Updated the SSL certificate for try.diffoscope.org, a hosted version of the diffoscope in-depth and content-aware diff utility. Continued thanks to Bytemark for sponsoring the hardware.

Debian & Reproducible builds

Whilst anyone can inspect the source code of free software for malicious flaws, most GNU/Linux distributions provide binary (or "compiled") packages to end users. The motivation behind the Reproducible Builds effort is to allow verification that no flaws have been introduced either maliciously and accidentally during this compilation process by promising identical binary packages are always generated from a given source.

Presented a talk entitled "Reproducible Builds" talk at Software Freedom Kosova, in Prishtina, Republic of Kosovo.
I filed my 2,500th bug in the Debian BTS: #840972: golang-google-appengine: accesses the internet during build.
In order to build packages reproducibly, one not only needs identical sources but also some external and sharable definition of the environment used for a particular build, stipulating such things such as the version numbers of the required build-dependencies. It is not currently clear how to handle these .buildinfo files after the archive software has processed them and how to make them available to the world so I started development on a proof-of-concept server to see what issues arise in practice. It is available at buildinfo.debian.net.
Chaired an IRC meeting and ran a poll to determine a regular time .
Submitted two design proposals to our wiki page.
Improvements to our tests.reproducible-builds.org testing framework:
- Move regular "Scheduled in..." messages to the #debian-reproducible-changes IRC channel.
- Use our log_info method instead of manual echo calls.
- Correct an "all sources packages" "all source packages" typo.
- Submit .buildinfo files to buildinfo.debian.net.
- Create GPG key on nodes for buildinfo.debian.net at deploy time, not "lazily".

My work in the Reproducible Builds project was also covered in our weekly reports. (#75, #76, #77 & #78).

I also submitted 14 patches to fix specific reproducibility issues in bio-eagle, cf-python, fastx-toolkit, fpga-icestorm, http-icons, lambda-align, mypy, playitslowly, seabios, stumpwm, sympa, tj3, wims-help & xotcl.

Debian LTS

This month I have been paid to work 13 hours on Debian Long Term Support (LTS). In that time I did the following:

Seven days of "frontdesk" duties, triaging CVEs, etc.
Issued DLA 647-1 for freeimage correcting an out-of-bounds write vulnerability in the XMP image handling functionality.
Issued DLA 649-1 for python-django fixing a possible CSRF protection bypass on sites that use Google Analytics.
Issued DLA 654-1 for libxfixes preventing an integer overflow when a malicious client sent INT_MAX as a "length".
Issued DLA 662-1 for quagga correcting a programming error where two constants were confused that could cause stack overrun in IPv6 routing code.
Issued DLA 688-1 for cairo to prevent a DoS attack where a malicious SVG could generate invalid pointers.

Patches contributed

python-debian: Missing chardet from setup.py depends
lintian: Drop double spaces in includes-maintscript paragraph
db5.3: maintscript includes "maint-script parameters"
dracut: maintscript includes "maint-script parameters"

Uploads

gunicorn:
- 19.6.0-7 Set supplementary groups when changing uid, add an example systemd .service file to gunicorn-examples, and expand README.Debian to make it clearer what to do now that /etc/gunicorn.d has been removed.
- 19.6.0-8 Correct previous supplementary groups patch to be compatible with Python 3.
redis:
- 3:3.2.4-2 Ensure that sentinel's configuration actually writes to a pidfile location so that systemd can detect that the daemon has started.
- 3:3.2.5-1 New upstream release.
libfiu:
- 0.94-8 Fix FTBFS under Bash due to lack of && in debian/rules.
- 0.94-9 Ensure the build is reproducible by sorting injected modules.
aptfs (2:0.8-2) Minor cosmetic changes.

Antoine Beaupr : My free software activities, October 2016

Debian Long Term Support (LTS) This is my 7th month working on Debian LTS, started by Raphael Hertzog at Freexian, after a long pause during the summer. I have worked on the following packages and CVEs:

tre: CVE-2016-8859

graphicsmagick: CVE-2016-7448, CVE-2016-7996, CVE-2016-7997, CVE-2016-8682, CVE-2016-8683, CVE-2016-8684

tar: CVE-2016-6321

I have also helped review work on the following packages:

imagemagick: reviewed BenH's work to figure out what was done. unfortunately, I forgot to officially take on the package and Roberto started working on it in the meantime. I nevertheless took time to review Roberto's work and outline possible issues with the original patchset suggested

tiff: reviewed Raphael's work on the hairy `TIFFTAG_*` issues, all the gory details in this email

The work on ImageMagick and GraphicsMagick was particularly intriguing. Looking at the source of those programs makes me wonder why were are still using them at all: it's a tangled mess of C code that is bound to bring up more and more vulnerabilities, time after time. It seems there's always an "Magick" vulnerability waiting to be fixed out there... I somehow hoped that the fork would bring more stability and reliability, but it seems they are suffering from similar issues because, fundamentally, they haven't rewritten ImageMagick... It looks this is something that affects all image programs. The review I have done on the tiff suite give me the same shivering sensation as reviewing the "Magick" code. It feels like all image libraries are poorly implemented and then bound to be exploited somehow... Nevertheless, if I had to use a library of the sort in my software, I would stay away from the "Magick" forks and try something like imlib2 first... Finally, I also did some minor work on the user and developer LTS documentation and some triage work on samba, xen and libass. I also looked at the dreaded CVE-2016-7117 vulnerability in the Linux kernel to verify its impact on wheezy users. I also looked at implementing a `--lts` flag for `dch` (see bug #762715). It was difficult to get back to work after such a long pause, but I am happy I was able to contribute a significant number of hours. It's a bit difficult to find work sometimes in LTS-land, even if there's actually always a lot of work to be done. For example, I used to be one of the people doing frontdesk work, but those duties are now assigned until the end of the year, so it's unlikely I will be doing any of that for the forseable future. Similarly, a lot of packages were assigned when I started looking at the available packages. There was an interesting discussion on the internal mailing list regarding unlocking package ownership, because some people had packages locked for weeks, sometimes months, without significant activity. Hopefully that situation will improve after that discussion. Another interesting discussion I participated in is the question of whether the LTS team should be waiting for unstable to be fixed before publishing fixes in oldstable. It seems the consensus right now is that it shouldn't be mandatory to fix issues in unstable before we fix security isssues in oldstable and stable. After all, security support for testing and unstable is limited. But I was happy to learn that working on brand new patches is part of our mandate as part of the LTS work. I did work on such a patch for tar which ended up being adopted by the original reporter, although upstream ended up implementing our recommendation in a better way. It's coincidentally the first time since I start working on LTS that I didn't get the number of requested hours, which means that there are more people working on LTS. That is a good thing, but I am worried it may also mean people are more spread out and less capable of focusing for longer periods of time on more difficult problems. It also means that the team is growing faster than the funding, which is unfortunate: now is a good time as any to remind you to see if you can make your company fund the LTS project if you are still running Debian wheezy.

Other free software work It seems like forever that I did such a report, and while I was on vacation, a lot has happened since the last report.

Monkeysign I have done extensive work on Monkeysign, trying to bring it kicking and screaming in the new world of GnuPG 2.1. This was the objective of the 2.1 release, which collected about two years of work and patches, including arbitrary MUA support (e.g. Thunderbird), config files support, and a release on PyPI. I have had to release about 4 more releases to try and fix the build chain, ship the test suite with the program and have a primitive preferences panel. The 2.2 release also finally features Tor suport! I am also happy to have moved more documentation to Read the docs, part of which I mentionned in in a previous article. The git repositories and issues were also moved to a Gitlab instance which will hopefully improve the collaboration workflow, although we still have issues in streamlining the merge request workflow. All in all, I am happy to be working on Monkeysign, but it has been a frustrating experience. In the last years, I have been maintaining the project largely on my own: although there are about 20 contributors in Monkeysign, I have committed over 90% of the commits in the code. New contributors recently showed up, and I hope this will release some pressure on me being the sole maintainer, but I am not sure how viable the project is.

Funding free software work More and more, I wonder how to sustain my contributions to free software. As a previous article has shown, I work a lot on the computer, even when I am not on a full-time job. Monkeysign has been a significant time drain in the last months, and I have done this work on a completely volunteer basis. I wouldn't mind so much except that there is a lot of work I do on a volunteer basis. This means that I sometimes must prioritize paid consulting work, at the expense of those volunteer projects. While most of my paid work usually revolves around free sofware, the benefits of paid work are not always immediately obvious, as the primary objective is to deliver to the customer, and the community as a whole is somewhat of a side-effect. I have watched with interest joeyh's adventures into crowdfunding which seems to be working pretty well for him. Unfortunately, I cannot claim the incredible (and well-deserved) reputation Joey has, and even if I could, I can't live with 500$ a month. I would love to hear if people would be interested in funding my work in such a way. I am hesitant in launching a crowdfunding campaign because it is difficult to identify what exactly I am working on from one month to the next. Looking back at earlier reports shows that I am all over the place: one month I'll work on a Perl Wiki (Ikiwiki), the next one I'll be hacking at a multimedia home cinema (Kodi). I can hardly think of how to fund those things short of "just give me money to work on anything I feel like", which I can hardly ask for of anyone. Even worse, it feels like the audience here is either friends or colleagues. It would make little sense for me to seek funding from those people: colleagues have the same funding problems I do, and I don't want to empoverish my friends... So far I have taken the approach of trying to get funding for work I am doing, bit by bit. For example, I have recently been told that LWN actually pays for contributed articles and have started running articles by them before publishing them here. This is looking good: they will publish an article I wrote about the Omnia router I have recently received. I give them exclusive rights on the article for two weeks, but I otherwise retain full ownership over the article and will publish them after the exclusive period here. Hopefully, I will be able to find more such projects that pays for the work I do on a day to day basis.

Open Street Map editing I have ramped up my OpenStreetMap contributions, having (temporarily) moved to a different location. There are lots of things to map here: trails, gaz stations and lots of other things are missing from the map. Sometimes the effort looks a bit ridiculous, reminding me of my early days of editing OSM. I have registered to OSM Live, a project to fund OSM editors that, I must admit, doesn't help much with funding my work: with the hundreds of edits I did in October, I received the equivalent of 1.80$CAD in Bitcoins. This may be the lowest hourly salary I have ever received, probably going at a rate of 10 per hour! Still, it's interesting to be able to point people to the project if someone wants to contribute to OSM mappers. But mappers should have no illusions about getting a decent salary from this effort, I am sorry to say.

Bounties I feel this is similar to the "bounty" model used by the Borg project: I claimed around $80USD in that project for what probably amounts to tens of hours of work, yet another salary that would qualify as "poor". Another example is a feature I would like to implement in Borg: support for protocols other than SSH. There is currently no bounty on this, but a similar feature, S3 support has one of the largest bounties Borg has ever seen: $225USD. And the claimant for the bounty hasn't actually implemented the feature, instead backing up to S3, the patch (to a third-party tool) actually enables support for Amazon Cloud Drive, a completely different API. Even at $225, I wouldn't be able to complete any of those features and get a decent salary. As well explained by the Snowdrift reviews, bounties just don't work at all... The ludicrous 10% fee charged by Bountysource made sure I would never do business with them ever again anyways.

Other work

fixed a bug in python's setuptools_scm that was a blocker for Monkeysign's use of the package to avoid duplicating version numbers everywhere...

tried to figure out how activitypub was going to work with existing social networking software (TL;DR: it won't)

improved the sicherboot README so that others that come after me will better understand how that secure boot system works

more charybdis IRCd work: SIGABRT on jessie and another segfault, still more issues pending here, even

arbitrary source URL support for the Sphinx Alabaster theme

a bunch of issues on unattended-upgrades: raspbian configuration fails and point releases upgrades

There are probably more things I did recently, but I am having difficulty keeping track of the last 5 months of on and off work, so you will forgive that I am not as exhaustive as I usually am.

25 October 2016

Julian Andres Klode: Introducing DNS66, a host blocker for Android

I m proud (yes, really) to announce DNS66, my host/ad blocker for Android 5.0 and newer. It s been around since last Thursday on F-Droid, but it never really got a formal announcement. DNS66 creates a local VPN service on your Android device, and diverts all DNS traffic to it, possibly adding new DNS servers you can configure in its UI. It can use hosts files for blocking whole sets of hosts or you can just give it a domain name to block (or multiple hosts files/hosts). You can also whitelist individual hosts or entire files by adding them to the end of the list. When a host name is looked up, the query goes to the VPN which looks at the packet and responds with NXDOMAIN (non-existing domain) for hosts that are blocked. You can find DNS66 here:

on GitHub: https://github.com/julian-klode/dns66
on F-Droid: https://f-droid.org/app/org.jak_linux.dns66

F-Droid is the recommended source to install from. DNS66 is licensed under the GNU GPL 3, or (mostly) any later version. Implementation Notes DNS66 s core logic is based on another project, dbrodie/AdBuster, which arguably has the cooler name. I translated that from Kotlin to Java, and cleaned up the implementation a bit: All work is done in a single thread by using poll() to detect when to read/write stuff. Each DNS request is sent via a new UDP socket, and poll() polls over all UDP sockets, a Device Socket (for the VPN s tun device) and a pipe (so we can interrupt the poll at any time by closing the pipe). We literally redirect your DNS servers. Meaning if your DNS server is 1.2.3.4, all traffic to 1.2.3.4 is routed to the VPN. The VPN only understands DNS traffic, though, so you might have trouble if your DNS server also happens to serve something else. I plan to change that at some point to emulate multiple DNS servers with fake IPs, but this was a first step to get it working with fallback: Android can now transparently fallback to other DNS servers without having to be aware that they are routed via the VPN. We also need to deal with timing out queries that we received no answer for: DNS66 stores the query into a LinkedHashMap and overrides the removeEldestEntry() method to remove the eldest entry if it is older than 10 seconds or there are more than 1024 pending queries. This means that it only times out up to one request per new request, but it eventually cleans up fine.
Filed under: Android, Uncategorized

8 October 2016

Norbert Preining: Debian/TeX update October 2016: all of TeX Live and Biber 2.6

Finally a new update of many TeX related packages: all the texlive-* including the binary packages, and biber have been updated to the latest release. This upload was delayed by my travels around the world, as well as the necessity to package a new Perl module (libdatetime-calendar-julian-perl) as required by new Biber. Also, my new job leaves me only the weekends for packaging. Anyway, the packages are now uploaded and should appear soon on your friendly local server. texlive2016-debian

There are several highlights: The binaries have been patched with several upstream fixes (tex4ht and XeTeX compatibility, as well as various Japanese TeX engine fixes), updated biber and biblatex, and as usual loads of new and updated packages. Last but not least I want to thank one particular author: His package was removed from TeX Live due to the addition of a rather unusual clause in the license. Instead of simply uploading new packages to Debian with the rather important removed, I contacted the author and asked for clarification. And to my great pleasure he immediately answered with an update of the package with fixed license. All of us user of these many packages should be grateful to the authors of the packages who invest loads of their free time into supporting our community. Thanks! Enough now, here as usual the list of new and updated packages with links to their respective CTAN pages. Enjoy. New packages addfont, apalike-german, autoaligne, baekmuk, beamerswitch, beamertheme-cuerna, beuron, biblatex-claves, biolett-bst, cooking-units, cstypo, emf, eulerpx, filecontentsdef, frederika2016, grant, latexgit, listofitems, overlays, phonenumbers, pst-arrow, quicktype, revquantum, richtext, semantic-markup, spalign, texproposal, tikz-page, unfonts-core, unfonts-extra, uspace. Updated packages achemso, acmart, acro, adobemapping, alegreya, allrunes, animate, arabluatex, archaeologie, asymptote, attachfile, babel-greek, bangorcsthesis, beebe, biblatex, biblatex-anonymous, biblatex-apa, biblatex-bookinother, biblatex-chem, biblatex-fiwi, biblatex-gost, biblatex-ieee, biblatex-manuscripts-philology, biblatex-morenames, biblatex-nature, biblatex-opcit-booktitle, biblatex-phys, biblatex-realauthor, biblatex-science, biblatex-true-citepages-omit, bibleref, bidi, chemformula, circuitikz, cochineal, colorspace, comment, covington, cquthesis, ctex, drawmatrix, ejpecp, erewhon, etoc, exsheets, fancyhdr, fei, fithesis, footnotehyper, fvextra, geschichtsfrkl, gnuplottex, gost, gregoriotex, hausarbeit-jura, ijsra, ipaex, jfontmaps, jsclasses, jslectureplanner, latexdiff, leadsheets, libertinust1math, luatexja, markdown, mcf2graph, minutes, multirow, mynsfc, nameauth, newpx, newtxsf, notespages, optidef, pas-cours, platex, prftree, pst-bezier, pst-circ, pst-eucl, pst-optic, pstricks, pstricks-add, refenums, reledmac, rsc, shdoc, siunitx, stackengine, tabstackengine, tagpair, tetex, texlive-es, texlive-scripts, ticket, translation-biblatex-de, tudscr, turabian-formatting, updmap-map, uplatex, xebaposter, xecjk, xepersian, xpinyin. Enjoy.

25 September 2016

Julian Andres Klode: Introducing TrieHash, a order-preserving minimal perfect hash function generator for C(++)

Abstract I introduce TrieHash an algorithm for constructing perfect hash functions from tries. The generated hash functions are pure C code, minimal, order-preserving and outperform existing alternatives. Together with the generated header files,they can also be used as a generic string to enumeration mapper (enums are created by the tool). Introduction APT (and dpkg) spend a lot of time in parsing various files, especially Packages files. APT currently uses a function called AlphaHash which hashes the last 8 bytes of a word in a case-insensitive manner to hash fields in those files (dpkg just compares strings in an array of structs). There is one obvious drawback to using a normal hash function: When we want to access the data in the hash table, we have to hash the key again, causing us to hash every accessed key at least twice. It turned out that this affects something like 5 to 10% of the cache generation performance. Enter perfect hash functions: A perfect hash function matches a set of words to constant values without collisions. You can thus just use the index to index into your hash table directly, and do not have to hash again (if you generate the function at compile time and store key constants) or handle collision resolution. As #debian-apt people know, I happened to play a bit around with tries this week before guillem suggested perfect hashing. Let me tell you one thing: My trie implementation was very naive, that did not really improve things a lot Enter TrieHash Now, how is this related to hashing? The answer is simple: I wrote a perfect hash function generator that is based on tries. You give it a list of words, it puts them in a trie, and generates C code out of it, using recursive switch statements (see code generation below). The function achieves competitive performance with other hash functions, it even usually outperforms them. Given a dictionary, it generates an enumeration (a C enum or C++ enum class) of all words in the dictionary, with the values corresponding to the order in the dictionary (the order-preserving property), and a function mapping strings to members of that enumeration. By default, the first word is considered to be 0 and each word increases a counter by one (that is, it generates a minimal hash function). You can tweak that however:

= 0
WordLabel ~ Word
OtherWord = 9

will return 0 for an unknown value, map Word to the enum member WordLabel and map OtherWord to 9. That is, the input list functions like the body of a C enumeration. If no label is specified for a word, it will be generated from the word. For more details see the documentation C code generation

switch(string[0]   32)  
case 't':
    switch(string[1]   32)  
    case 'a':
        switch(string[2]   32)  
        case 'g':
            return Tag;
         
     
 
return Unknown;

Yes, really recursive switches they directly represent the trie. Now, we did not really do a straightforward translation, there are some optimisations to make the whole thing faster and easier to look at: First of all, the 32 you see is used to make the check case insensitive in case all cases of the switch body are alphabetical characters. If there are non-alphabetical characters, it will generate two cases per character, one upper case and one lowercase (with one break in it). I did not know that lowercase and uppercase characters differed by only one bit before, thanks to the clang compiler for pointing that out in its generated assembler code! Secondly, we only insert breaks only between cases. Initially, each case ended with a return Unknown, but guillem (the dpkg developer) suggested it might be faster to let them fallthrough where possible. Turns out it was not faster on a good compiler, but it s still more readable anywhere. Finally, we build one trie per word length, and switch by the word length first. Like the 32 trick, his gives a huge improvement in performance. Digging into the assembler code The whole code translates to roughly 4 instructions per byte:

A memory load,
an or with 32
a comparison, and
a conditional jump.

(On x86, the case sensitive version actually only has a cmp-with-memory and a conditional jump). Due to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77729 this may be one instruction more: On some architectures an unneeded zero-extend-byte instruction is inserted this causes a 20% performance loss. Performance evaluation I run the hash against all 82 words understood by APT in Packages and Sources files, 1,000,000 times for each word, and summed up the average run-time:

host	arch	Trie	TrieCase	GPerfCase	GPerf	DJB
plummer	ppc64el	540	601	1914	2000	1345
eller	mipsel	4728	5255	12018	7837	4087
asachi	arm64	1000	1603	4333	2401	1625
asachi	armhf	1230	1350	5593	5002	1784
barriere	amd64	689	950	3218	1982	1776
x230	amd64	465	504	1200	837	693

Suffice to say, GPerf does not really come close. All hosts except the x230 are Debian porterboxes. The x230 is my laptop with a a Core i5-3320M, barriere has an Opteron 23xx. I included the DJB hash function for another reference. Source code The generator is written in Perl, licensed under the MIT license and available from https://github.com/julian-klode/triehash I initially prototyped it in Python, but guillem complained that this would add new build dependencies to dpkg, so I rewrote it in Perl. Benchmark is available from https://github.com/julian-klode/hashbench Usage See the script for POD documentation.
Filed under: General

7 September 2016

Julian Andres Klode: New software: sicherboot

Today, I wrote sicherboot, a tool to integrate systemd-boot into a Linux distribution in an entirely new way: With secure boot support. To be precise: The use case here is to only run trusted code which then unmounts an otherwise fully encrypted disk, as in my setup: screenshot-from-2016-09-06-04-09-52

If you want, sicherboot automatically creates db, KEK, and PK keys, and puts the public keys on your EFI System Partition (ESP) together with the KeyTool tool, so you can enroll the keys in UEFI. You can of course also use other keys, you just need to drop a db.crt and a db.key file into /etc/sicherboot/keys. It would be nice if sicherboot could enroll the keys directly in Linux, but there seems to be a bug in efitools preventing that at the moment. For some background: The Platform Key (PK) signs the Key Exchange Key (KEK) which signs the database key (db). The db key is the one signing binaries. sicherboot also handles installing new kernels to your ESP. For this, it combines the kernel with its initramfs into one executable UEFI image, and then signs that. Combined with a fully encrypted disk setup, this assures that only you can run UEFI binaries on the system, and attackers cannot boot any other operating system or modify parts of your operating system (except for, well, any block of your encrypted data, as XTS does not authenticate the data; but then you do have to know which blocks are which which is somewhat hard). sicherboot integrates with various parts of Debian: It can work together by dracut via an evil hack (diverting dracut s kernel/postinst.d config file, so we can run sicherboot after running dracut), it should support initramfs-tools (untested), and it also integrates with systemd upgrades via triggers on the /usr/lib/systemd/boot/efi directory. Currently sicherboot only supports Debian-style setups with /boot/vmlinuz-<version> and /boot/initrd.img-<version> files, it cannot automatically create combined boot images from or install boot loader entries for other naming schemes yet. Fixing that should be trivial though, with a configuration setting and some eval magic (or string substitution). Future planned features include: (1) support for multiple ESP partitions, so you can have a fallback partition on a different drive (think RAID type situation, keep one ESP on each drive, so you can remove a failing one); and (2) a tool to create a self-contained rescue disk image from a directory (which will act as initramfs) and a kernel (falling back to a vmlinuz file ) It might also be interesting to add support for other bootloaders and setups, so you could automatically sign a grub cryptodisk image for example. Not sure how much sense that makes. I published the source at https://github.com/julian-klode/sicherboot (MIT licensed) and uploaded the package to Debian, it should enter the NEW queue soon (or be in NEW by the time you read this). Give it a try, and let me know what you think.
Filed under: Debian, sicherboot

2 September 2016

Julian Andres Klode: apt 1.3 RC4 Tweaking apt update

Did that ever happen to you: You run apt update, it fetches a Release file, then starts fetching DEP-11 metadata, then any pdiff index stuff, and then applies them; all after another? Or this: You don t see any update progress until very near the end? Worry no more: I tweaked things a bit in 1.3~rc4 (git commit). Prior to 1.3~rc4, acquiring the files for an update worked like this: We create some object for the Release file, once a release file is done we queue any next object (DEP-11 icons, .diff/Index files, etc). There is no prioritizing, so usually we fetch the 5MB+ DEP-11 icons and components files first, and only then start working on other indices which might use Pdiff. In 1.3~rc4 I changed the queues to be priority queues: Release files and .diff/Index files have the highest priority (once we have them all, we know how much to fetch). The second level of priority goes to the .pdiff files which are later on passed to the rred process to patch an existing Packages, Sources, or Contents file. The third priority level is taken by all other index targets. Actually, I implemented the priority queues back in Jun. There was just one tiny problem: Pipelining. We might be inserting elements into our fetching queues in order of priority, but with pipelining enabled, stuff of lower priority might already have their HTTP request sent before we even get to queue the higher priority stuff. Today I had an epiphany: We fill the pipeline up to a number of items (the depth, currently 10). So, let s just fill the pipeline with items that have the same (or higher) priority than the maximum priority of the already-queued ones; and pretend it is full when we only have lower priority items. And that works fine: First the Release and .diff/Index stuff is fetched, which means we can start showing accurate progress info from there one. Next, the pdiff files are fetched, meaning that we can apply them in parallel to any targets downloading later in parallel (think DEP-11 icon tarballs). This has a great effect on performance: For the 01 Sep 2016 03:35:23 UTC -> 02 Sep 2016 09:25:37 update of Debian unstable and testing with Contents and appstream for amd64 and i386, update time reduced from 37 seconds to 24-28 seconds. In other news I recently cleaned up the apt packaging which renamed /usr/share/bug/apt/script to /usr/share/bug/apt. That broke on overlayfs, because dpkg could not rename the old apt directory to a backup name during unpack (only directories purely on the upper layer can be renamed). I reverted that now, so all future updates should be fine. David re-added the Breaks against apt-utils I recently removed by accident during the cleanup, so no more errors about overriding dump solvers. He also added support for fingerprints in gpgv s GOODSIG output, which apparently might come at some point. I Also fixed a few CMake issues, fixed the test suite for gpgv 2.1.15, allow building with a system-wide gtest library (we really ought to add back a pre-built one in Debian), and modified debian/rules to pass -O to make. I wish debhelper would do the latter automatically (there s a bug for that). Finally, we fixed some uninitialized variables in the base256 code, out-of-bound reads in the Sources file parser, off-by-one errors in the tagfile comment stripping code[1], and some memcpy() with length 0. Most of these will be cherry-picked into the 1.2 (xenial) and 1.0.9.8 (jessie) branches (releases 1.2.15 and 1.0.9.8.4). If you forked off your version of apt at another point, you might want to do the same. [1] those were actually causing the failures and segfaults in the unit tests on hurd-i386 buildds. I always thought it was a hurd-specific issue PS. Building for Fedora on OBS has a weird socket fd #3 that does not get closed during the test suite despite us setting CLOEXEC on it. Join us in #debian-apt on oftc if you have ideas.
Filed under: Debian, Ubuntu

10 August 2016

Julian Andres Klode: Porting APT to CMake

Ever since it s creation back in the dark ages, APT shipped with it s own build system consisting of autoconf and a bunch of makefiles. In 2009, I felt like replacing that with something more standard, and because nobody really liked autotools, decided to go with CMake. Well, the bazaar branch was never really merged back in 2009. Fast forward 7 years to 2016. A few months ago, we noticed that our build system had trouble with correct dependencies in parallel building. So, in search for a way out, I picked up my CMake branch from 2009 last Thursday and spent the whole weekend working on it, and today I am happy to announce that I merged it into master:

123 files changed, 1674 insertions(+), 3205 deletions(-)

More than 1500 lines less build system code. Quite impressive, eh? This also includes about 200 lines of less code in debian/, as that switched from prehistoric debhelper stuff to modern dh (compat level 9, almost ready for 10). The annoying Tale of Targets vs Files Talking about CMake: I don t really love it. As you might know, CMake differentiates between targets and files. Targets can in some cases depend on files (generated by a command in the same directory), but overall files are not really targets. You also cannot have a target with the same name as a file you are generating in a custom command, you have to rename your target (make is OK with the generated stuff, but ninja complains about cycles because your custom target and your custom command have the same name). Byproducts for the (time) win One interesting thing about CMake and Ninja are byproducts. In our tree, we are building C++ files. We also have .pot templates depending on them, and .mo files depending on the templates (we have multiple domains, and merge the per-domain .pot with the all-domain .po file during the build to get a per-domain .mo). Now, if we just let them depend naively, changing a C++ file causes the .pot file to be regenerated which in turns causes us to build .mo files for every freaking language in the package. Even if nothing changed. Byproducts solve this problem. Instead of just building the .pot file, we also create a stamp file (AKA the witness) and write the .pot file (without a header) into a temporary name and only copy it to its final name if the content changed. The .pot file is declared as a byproduct of the command. The command doing the .pot->.mo step still depends on the .pot file (the byproduct), but as that only changes now if strings change, the .mo files only get rebuild if I change a translatable string. We still need to ensure that that the .pot file is actually built before we try to use it the solution here is to specify a custom target depending on the witness and then have the target containing the .mo build commands depend on that target. Now if you use make, you might now this trick already. In make, the byproducts remain undeclared, though, while in CMake we can now actually express them, and they are used by the Ninja generator and the Ninja build tool if you chose that over make (try it out, it s fast). Further Work Some command names are hardcoded, I should find_program() them. Also cross-building the package does not yet work successfully, but it only requires a tiny amount of patches in debhelper and/or cmake. I also tried building the package on a Fedora docker image (with dpkg installed, it s available in the Fedora sources). While I could eventually get the programs build and most of the integration test suite to pass, there are some minor issues to fix, mostly in the documentation building and GTest department: Fedora ships its docbook stylesheets in a different location, and ships GTest as a pre-compiled library, and not a source tree. I have not yet tested building on exotic platforms like macOS, or even a BSD. Please do and report back. In Debian, CMake is not up-to.date enough on the non-Linux platforms to build APT due to test suite failures, I hope those can be fixed/disabled soon (it appears to be a timing issue AFAICT). I hope that we eventually get some non-Debian backends for APT. I d love that.
Filed under: Debian, Uncategorized

1 August 2016

Shirish Agarwal: Doha and the past year in APT

A week has gone by and another small sharing about Doha and one package that quite a few of us use everyday but don t think much of them. Let s start with Doha with these two pictures which tells/shares a bit about what the Doha of today is like

While I have more than a dozen snapshots of Doha, all of them show same thing, all are huge skyscrapers and overall Doha seems to be aping Dubai and is in a frenzy as World Cup 2022 is around the corner. We did see a few of the older places but these seemed to be more done for tourists rather than the real thing. We saw stuff like

This was a picture taken by Ritesh Raj Saraff, a friend and a DD whom I met while I was going to Debconf. The place where this picture has been taken is known as a Souk or what we know as market-place. This was a place where you could get spices. Quite a few of the spices that we get and use in India were bought from Middle-East in the olden times. In fact, it has been argued that the whole Mughlai food that is part of Indian culture was imported from Middle-East when we were trading them before India or Akhand Bharat was invaded. What was interesting to both of us is that we could perceive that most of the buildings had a sort of fakeness to it, they tried to show that it had a lot of detailed work on the buildings but we could see it was all done recently so not that old as being led to believe. One of the other interesting bits that we came to know throughout our stay in Qatar that almost 80-90% of the staff we met inside Qatar airport as well as in the Souk were people from Asian sub-continent and more specifically from South India. I had few interesting conversations with some of the people who were managing the shops were that almost of them were just employees while the owners were Qataris . I could understand this as the distance and flight between Qatar and India is hardly 3 hours. It seemed very similar to how Mexicans look for work in United States. The most expensive thing there was water as it s desert other than housing and most workers seemed to have shared accommodation from anywhere between 5-15 people in one room. It s only the relative strength of the Qatari Rial which probably compels them to be there. The temperature was around 45 degrees with a bit of humidity as it s next to the Ocean. For all the money in the world, I wouldn t work there. It is true that you know your city s worth only when you go outside

I do have some more stories about Qatar but that would have to wait for another day now. Also, I really don t want to talk much about this part as it s part depressing but probably would explore it a bit in a further blog-post. One of the more interesting topics that I attended was the apt talk . There are 3-4 tools in the Debian world i.e. apt, aptitude, apt-get, dpkg and dselect. More often than not people know aptitude and apt-get whereas the rest of the packages are not thought so much about. What I somewhat suspected about the history of apt was revealed to be true today, courtesy David K.

You can see the talk/video about apt at http://meetings-archive.debian.net/pub/debian-meetings/2016/debconf16/The_past_year_in_APT.webm. I had been curious about apt,libapt,dpkg and the entire tool-chain which goes into updating packages and like. I had a couple of conversations here in India before on mail, in person and IRC as well as couple of conversations in South Africa as well before the APT talk where it was told that packages are not signed or it s not easy to figure out the integrity. Being a Debian fan-boy I could not believe this to be true. Hence I asked and to my dismay found it to be true . I also then asked the same with a bit more background on the mailing list as well and got to know that this has been a concern since 2005. As I do not have the requisite skills and the person would require probably knowledge of dpkg internals as well as have probably good social skills to have at least 1-2 DD s help her/im to work on it and have probably some server space where even some partial archive is re-built using debian packages which use dpkg-sig . I also had some concerns that even if somebody did do the work, it might come in the way of the reproducible builds concept where Neils shared ways in which it could be overcome. Having said the above, it is totally doable if somebody has the will, skills and the patience to do it. Just look at the amazing work done by the team which re-built almost all the archive using clang. See clang.debian.net for the amazing work that they have done. Now, one of the issues in India which comes in popularizing Debian or in fact any free software distribution in India is the bandwidth issue or rather the lack of it or how expensive it is. The situation for better lack of term is pathetic . While nothing can be done till the time the Govt. gives limited term oligopoly licenses to telecom operators and they have a cabal (cabal closed team where decisions and policies are made without any knowledge of and to other stakeholders.) we need to find ways to make the best of the situation. Anyways, while there are some ideas to tackle that but that s a long-term goal and I will share some aspects of it in probably another blog post. In the interim somethings can definitely be made better. Now one of the issues that is there for most people is getting the package updates. Before updating the packages, the package index needs to be updated. Now, both in home and work environments most people are cautious to update the package index. But many times, either due to bandwidth issues or some other issue which is outside your control, your package index is corrupted. I have put both the possible reasons of why and how the package index corruption takes place and a probable work-around of in the deity mail post. I do hope to put in a more coherent state by probably making smaller bug issues so they could be tackled or answered one by one. Any improvements would be better for stability of debian infrastructure only. If anybody does do the required work and need a guinea pig for testing, count me in. Just holler and share you will be working on this aspect and at least one of my workstations would definitely take part in seeing if its better or not. Even if you are able to just provide a way to make a copy of /var/lib/apt/lists after every successful update and do the comparison with time-stamp on next run and only change the copy when a successful update occurs, that will be a huge help in itself. Look forward to hearing form one and all.
Filed under: Miscellenous Tagged: #Debconf16, #feature-request, #Julian Andreas Klose, #shell-script ?, apt, aptitude

18 June 2016

Manuel A. Fernandez Montecelo: More work on aptitude

The last few months have been a bit of a crazy period of ups and downs, with a tempest of events beneath the apparent and deceivingly calm surface waters of being unemployed (still at it). The daily grind Chief activities are, of course, those related to the daily grind of job-hunting, sending applications, and preparing and attending interviews. It is demoralising when one searches for many days or weeks without seeing anything suitable for one's skills or interests, or other more general life expectations. And it takes a lot of time and effort to put one's best in the applications for positions that one is really, really, interested in. And even for the ones which are meh, for a variety of reasons (e.g. one is not very suitable for what the offer demands). After that, not being invited to interviews (or doing very badly at them) is bad, of course, but quick and not very painful. A swift, merciful end to the process. But it's all the more draining when waiting for many weeks when not a few months with the uncertainty of not knowing if one is going to be lucky enough to be summoned for an interview; harbouring some hope one has to appear enthusiastic in the interviews, after all , while trying to keep it contained lest it grows too much ; then in the interview hearing good words and some praises, and feeling the impression that one will fit in, that one did nicely and that chances are good letting the hope grow again ; start to think about life changes that the job will require to make a quick decision should the offer finally arrives ; perhaps make some choices and compromises based on the uncertain result; then wait for a week or two after the interview to know the result... ... only to end up being unsuccessful. All the effort and hopes finally get squashed with a cold, short email or automatic response, or more often than not, complete radio silence from prospective employers, as an end to a multi-month-long process. An emotional roller coaster ^[1], which happened to me several times in the last few months. All in a day's work The months of preparing and waiting for a new job often imply an impasse that puts many other things that one cares about on hold, and one makes plans that will never come to pass. All in a day's (half-year's?) work of an unemployed poor soul. But not all is bad. This period was also a busy time doing some plans about life, mid- and long-term; the usual and some really unusual! family events; visits to and from friends, old and new; attending nice little local Debian gatherings or the bigger gathering of Debian SunCamp2016, and other work for side projects or for other events that will happen soon... And amidst all that, I managed to get some work done on aptitude. Two pictures worth (less than) a thousand bugs To be precise, worth 709 bugs 488 bugs in the first graph, plus 221 in the second. In 2015-11-15 (link to the post Work on aptitude): aptitude BTS Graph, 2015-11-15

In 2016-06-18: aptitude BTS Graph, 2016-06-18

Numbers The BTS numbers for aptitude right now are:

221 (259 if counting all merged bugs independently)
1 Release Critical (but it is an artificial bug to keep it from migrating to testing)
43 (55 unmerged) with severity Important or Normal
160 (182 unmerged) with severity Minor or Wishlist
17 (21 unmerged) marked as Forwarded or Pending

Highlights Beyond graphs and stats, I am specially happy about two achievements in the last year:

To have aptitude working today, first and foremost Apart from the abandon that suffered in previous years, I mean specifically the critical step of getting it through the troubles of the last summer, with the GCC-5/C++11 transition in parallel with a transition of the Boost library (explained in more detail in Work on aptitude). Without that, possibly aptitude would not have survived until today.
Improvements to the suggestions of the resolver In the version 0.8, there were a lot of changes related with improving the order of the suggestions from the resolver, when it finds conflicts or other problems with the planned actions. Historically, but specially in the last few years, there have been many complaints about the nonsensical or dangerous suggestions from the resolver. The first solution offered by the resolver was very often regarded as highly undesirable (for example, removal of many packages), and preferable solutions like upgrades of one or only a handful of packages being offered only after many removals; and keeps only offered as last resort.

Perhaps these changes don't get a lot of attention, given that in the first case it's just to keep working (with few people realising that it could have collapsed on the spot, if left unattended), and the second can probably go unnoticed because it just works or it started to work more smoothly doesn't get as much immediate attention as it suddenly broke! . Still, I wanted to mention them, because I am quite proud of those. Thanks Even if I put a lot of work on aptitude in the last year, the results of the graph and numbers have not been solely achieved by me. Special thanks go to Axel Beckert (abe / XTaran) and the apt team, David Kalnischkies and Julian Andres Klode who, despite the claim in that page, does not mostly work python-apt anymore... but also in the main tools. They help with fixing some of the issues directly, or changing things in apt that benefit aptitude, testing changes, triaging bugs or commenting on them, patiently explaining to me why something in libapt doesn't do what I think it does, and good company in general. Not the least, for holding impromptu BTS group therapy / support meetings, for those cases when prolonged exposure to BTS activity starts to induce very bad feelings. Thanks also to people who sent their translation updates, notified about corrections, sent or tested patches, submitted bugs, or tried to help in other ways. Change logs for details. Notes [1] ^{^} It's even an example in the Cambridge Dictionaries Online website, for the entry of roller coaster:

He was on an emotional roller coaster for a while when he lost his job.

11 May 2016

Julian Andres Klode: Backing up with borg and git-annex

I recently found out that I have access to a 1 TB cloud storage drive by 1&1, so I decided to start taking off-site backups of my $HOME (well, backups at all, previously I only mirrored the latest version from my SSD to an HDD). I initially tried obnam. Obnam seems like a good tool, but is insanely slow. Unencrypted it can write about 3 MB/s, which is somewhat OK, but even then it can spend hours forgetting generations (1 generation takes probably 2 minutes, and there might be 22 of them). In encrypted mode, the speed reduces a lot, to about 500 KB/s if I recall correctly, which is just unusable. I found borg backup, a fork of attic. Borg backup achieves speeds of up to 15 MB/s which is really nice. It s also faster with scanning: I can now run my bihourly backups in about 1 min 30s (usually backs up about 30 to 80 MB mostly thanks to Chrome I suppose!). And all those speeds are with encryption turned on. Both borg and obnam use some form of chunks from which they compose files. Obnam stores each chunk in its own file, borg stores multiple chunks (even from different files) in a single pack file which is probably the main reason it is faster. So how am I backing up: My laptop has an internal SSD and an HDD. I backup every 2 hours (at 09,11,13,15,17,19,21,23,01:00 hours) using a systemd timer event, from the SSD to the HDD. The backup includes all of $HOME except for Downloads, .cache, the trash, Android SDK, and the eclipse and IntelliJ IDEA IDEs. Now the magic comes in: The backup repository on the HDD is monitored by git-annex assistant, which automatically encrypts and uploads any new files in there to my 1&1 WebDAV drive and registers them in a git repository hosted on bitbucket. All files are encrypted and checksummed using SHA256, reducing the chance of the backup being corrupted. I m not sure how the WebDAV thing will work once I want to prune things, I suspect it will then delete some pack files and repack things into new files which means it will spend more bandwidth than obnam would. I d also have to convince git-annex to actually drop anything from the WebDAV remote, but that is not really that much of a concern with 1TB storage space in the next 2 years at least I also have an external encrypted HDD which I can take backups on, it currently houses a fuller backup of $HOME that also includes Downloads, the Android SDK, and the IDEs for quicker recovery. Downloads changes a lot, and all of them can be fairly easily re-retrieved from the internet as needed, so there s not much point in painfully uploading them to a WebDAV backup site.
Filed under: Uncategorized

30 March 2016

Colin Watson: Re-signing PPAs

Julian has written about their efforts to strengthen security in APT, and shortly before that notified us that Launchpad s signatures on PPAs use weak SHA-1 digests. Unfortunately we hadn t noticed that before; GnuPG s defaults tend to result in weak digests unless carefully tweaked, which is a shame. I started on the necessary fixes for this immediately we heard of the problem, but it s taken a little while to get everything in place, and I thought I d explain why since some of the problems uncovered are interesting in their own right. Firstly, there was the relatively trivial matter of using SHA-512 digests on new signatures. This was mostly a matter of adjusting our configuration, although writing the test was a bit tricky since PyGPGME isn t as helpful as it could be. (Simpler repository implementations that call gpg from the command line should probably just add the --digest-algo SHA512 option instead of imitating this.) After getting that in place, any change to a suite in a PPA will result in it being re-signed with SHA-512, which is good as far as it goes, but we also want to re-sign PPAs that haven t been modified. Launchpad hosts more than 50000 active PPAs, though, a significant percentage of which include packages for sufficiently recent Ubuntu releases that we d want to re-sign them for this. We can t expect everyone to push new uploads, and we need to run this through at least some part of our usual publication machinery rather than just writing a hacky shell script to do the job (which would have no idea which keys to sign with, to start with); but forcing full reprocessing of all those PPAs would take a prohibitively long time, and at the moment we need to interrupt normal PPA publication to do this kind of work. I therefore had to spend some quality time working out how to make things go fast enough. The first couple of changes (1, 2) were to add options to our publisher script to let us run just the one step we need in careful mode: that is, forcibly re-run the Release file processing step even if it thinks nothing has changed, and entirely disable the other steps such as generating Packages and Sources files. Then last week I finally got around to timing things on one of our staging systems so that we could estimate how long a full run would take. It was taking a little over two seconds per archive, which meant that if we were to re-sign all published PPAs then that would take more than 33 hours! Obviously this wasn t viable; even just re-signing xenial would be prohibitively slow. The next question was where all that time was going. I thought perhaps that the actual signing might be slow for some reason, but it was taking about half a second per archive: not great, but not enough to account for most of the slowness. The main part of the delay was in fact when we committed the database transaction after processing each archive, but not in the actual PostgreSQL commit, rather in the ORM invalidate method called to prepare for a commit. Launchpad uses the excellent Storm for all of its database interactions. One property of this ORM (and possibly of others; I ll cheerfully admit to not having spent much time with other ORMs) is that it uses a WeakValueDictionary to keep track of the objects it s populated with database results. Before it commits a transaction, it iterates over all those alive objects to note that if they re used in future then information needs to be reloaded from the database first. Usually this is a very good thing: it saves us from having to think too hard about data consistency at the application layer. But in this case, one of the things we did at the start of the publisher script was:

def getPPAs(self, distribution):
    """Find private package archives for the selected distribution."""
    if (self.isCareful(self.options.careful_publishing) or
            self.options.include_non_pending):
        return distribution.getAllPPAs()
    else:
        return distribution.getPendingPublicationPPAs()
def getTargetArchives(self, distribution):
    """Find the archive(s) selected by the script's options."""
    if self.options.partner:
        return [distribution.getArchiveByComponent('partner')]
    elif self.options.ppa:
        return filter(is_ppa_public, self.getPPAs(distribution))
    elif self.options.private_ppa:
        return filter(is_ppa_private, self.getPPAs(distribution))
    elif self.options.copy_archive:
        return self.getCopyArchives(distribution)
    else:
        return [distribution.main_archive]

That innocuous-looking filter means that we do all the public/private filtering of PPAs up-front and return a list of all the PPAs we intend to operate on. This means that all those objects are alive as far as Storm is concerned and need to be considered for invalidation on every commit, and the time required for that stacks up when many thousands of objects are involved: this is essentially accidentally quadratic behaviour, because all archives are considered when committing changes to each archive in turn. Normally this isn t too bad because only a few hundred PPAs need to be processed in any given run; but if we re running in a mode where we re processing all PPAs rather than just ones that are pending publication, then suddenly this balloons to the point where it takes a couple of seconds. The fix is very simple, using an iterator instead so that we don t need to keep all the objects alive:

from itertools import ifilter
def getTargetArchives(self, distribution):
    """Find the archive(s) selected by the script's options."""
    if self.options.partner:
        return [distribution.getArchiveByComponent('partner')]
    elif self.options.ppa:
        return ifilter(is_ppa_public, self.getPPAs(distribution))
    elif self.options.private_ppa:
        return ifilter(is_ppa_private, self.getPPAs(distribution))
    elif self.options.copy_archive:
        return self.getCopyArchives(distribution)
    else:
        return [distribution.main_archive]

After that, I turned to that half a second for signing. A good chunk of that was accounted for by the signContent method taking a fingerprint rather than a key, despite the fact that we normally already had the key in hand; this caused us to have to ask GPGME to reload the key, which requires two subprocess calls. Converting this to take a key rather than a fingerprint gets the per-archive time down to about a quarter of a second on our staging system, about eight times faster than where we started. Using this, we ve now re-signed all xenial Release files in PPAs using SHA-512 digests. On production, this took about 80 minutes to iterate over around 70000 archives, of which 1761 were modified. Most of the time appears to have been spent skipping over unmodified archives; even a few hundredths of a second per archive adds up quickly there. The remaining time comes out to around 0.4 seconds per modified archive. There s certainly still room for speeding this up a bit. We wouldn t want to do this procedure every day, but it s acceptable for occasional tasks like this. I expect that we ll similarly re-sign wily, vivid, and trusty Release files soon in the same way.

17 March 2016

Julian Andres Klode: Clarifications and updates on APT + SHA1

The APT 1.2.7 release is out now. Despite of what I wrote earlier, we now print warnings for Release files signed with signatures using SHA1 as the digest algorithm. This involved extending the protocol APT uses to communicate with the methods a bit, by adding a new 104 Warning message type.

W: gpgv:/var/lib/apt/lists/apt.example.com_debian_dists_sid_InRelease: The repository is insufficiently signed by key
1234567890ABCDEF0123456789ABCDEF01234567 (weak digest)

Also note that SHA1 support is not dropped, we merely do not consider it trustworthy. This means that it feels like SHA1 support is dropped, because sources without SHA2 won t work; but the SHA1 signatures will still be used in addition to the SHA2 ones, so there s no point removing them (same for MD5Sum fields). We also fixed some small bugs!
Filed under: Debian, Ubuntu

14 March 2016

Julian Andres Klode: Dropping SHA-1 support in APT

Tomorrow ~~is the anniversary of Caesar s assassination~~ APT will see a new release, turning of support for SHA-1 checksums in Debian unstable and in Ubuntu xenial, the upcoming LTS release. While I have no knowledge of an imminent attack on our use of SHA1, Xenial (Ubuntu 16.04 LTS) will be supported for 5 years, and the landscape may change a lot in the next 5 years. As disabling the SHA1 support requires a bit of patching in our test suite, it s best to do that now rather than later when we re forced to do it. This will mean that starting tomorrow, some third party repositories may stop working, such as the one for the web browser I am writing this with. Debian Derivatives should be mostly safe for that change, if they are registered in the Consensus, as that has checks for that. This is a bit unfortunate, but we have no real choice: Technical restrictions prevent us from just showing a warning in a sensible way. There is one caveat, however: GPG signatures may still use SHA1. While I have prepared the needed code to reject SHA1-based signatures in APT, a lot of third party repositories still ship Release files signed with signatures using SHA-1 as the digest algorithms. Some repositories even still use 1024-bit DSA keys. I plan to enforce SHA2 for GPG signatures some time after the release of xenial, and definitely for Ubuntu 16.10, so around June-August (possibly during DebConf). For xenial, I plan to have a SRU (stable release update) in January to do the same (it s just adding one member to an array). This should give 3rd party providers a reasonable time frame to migrate to a new digest algorithm for their GPG config and possibly a new repository key. Summary

Tomorrow: Disabling SHA1 support for Release, Packages, Sources files
June/July/August: Disabling SHA1 support for GPG signatures (InRelease/Release.gpg) in development releases
January 2017: Disabling SHA1 support for GPG signatures in Ubuntu 16.04 LTS via a stable-release-update.

Filed under: Debian, Ubuntu

4 February 2016

Daniel Pocock: Australians stuck abroad and alleged sex crimes

Two Australians have achieved prominence (or notoriety, depending on your perspective) for the difficulty in questioning them about their knowledge of alleged sex crimes. One is Julian Assange, holed up in the embassy of Ecuador in London. He is back in the news again today thanks to a UN panel finding that the UK is effectively detaining him, unlawfully, in the Ecuadorian embassy. The effort made to discredit and pursue Assange and other disruptive technologists, such as Aaron Swartz, has an eerie resemblance to the way the Inquisition hunted witches in the middle ages and beyond. The other Australian stuck abroad is Cardinal George Pell, the most senior figure in the Catholic Church in Australia. The Royal Commission into child sex abuse by priests has heard serious allegations claiming the Cardinal knew about and covered up abuse. This would appear far more sinister than anything Mr Assange is accused of. Like Mr Assange, the Cardinal has been unable to travel to attend questioning in person. News reports suggest he is ill and can't leave Rome, although he is being accommodated in significantly more comfort than Mr Assange. If you had to choose, which would you prefer to leave your child alone with?

17 January 2016

Lunar: Reproducible builds: week 38 in Stretch cycle

What happened in the reproducible builds effort between January 10th and January 16th:

Toolchain fixes Benjamin Drung uploaded mozilla-devscripts/0.43 which sorts the file list in preferences files. Original patch by Reiner Herrmann. Lunar submitted an updated patch series to make timestamps in packages created by dpkg deterministic. To ensure that the mtimes in `data.tar` are reproducible, with the patches, `dpkg-deb` uses the `--clamp-mtime` option added in tar/1.28-1 when available. An updated package has been uploaded to the experimental repository. This removed the need for a modified debhelper as all required changes for reproducibility have been merged or are now covered by `dpkg`.

Packages fixed The following packages have become reproducible due to changes in their build dependencies: angband-doc, bible-kjv, cgoban, gnugo, pachi, wmpuzzle, wmweather, wmwork, xfaces, xnecview, xscavenger, xtrlock, virt-top. The following packages became reproducible after getting fixed:

dist/1:3.5-36.0001-1 uploaded by Manoj Srivastava, original patch by Reiner Herrmann.

grib-api/1.14.4-4 by Alastair McKinstry.

gui-apt-key/0.4-2.1 by Axel Beckert, original patch by Chris Lamb.

kiki-the-nano-bot/1.0.2+dfsg1-6 by Peter De Wachter, original patch by Chris Lamb.

ldapvi/1.7-10 by Rhonda D'Vine, original patches (#777490, #793702) by Chris Lamb and akira.

libf2c2/20090411-3 by Barak A. Pearlmutter.

liquidsoap/1.1.1-7.1 by Mattia Rizzolo.

marsshooter/0.7.6-1 uploaded by Markus Koschany, fixed upstream.

offlineimap/6.6.1+dfsg1-2 by Ilias Tsitsimpis.

pgbouncer/1.7-1 by Christoph Berg.

python-requests-toolbelt/0.5.1-4 by Petter Reinholdtsen.

shorewall-core/5.0.3.1-1 uploaded by Roberto C. Sanchez, fixed upstream, obsolete patch by Chris Lamb.

slrnface/2.1.1-7 by Rhonda D'Vine, original patches (#777425, #793722) by Chris Lamb and akira.

tetradraw/2.0.3-9 by Rhonda D'Vine, original patches (#777426, #793727) by Chris Lamb and akira.

xblast-tnt-images/20050106-3 by Rhonda D'Vine, original patch by Chris Lamb.

xblast-tnt/2.10.4-4 by Rhonda D'Vine, original patch by Chris Lamb.

xtrkcad/1:4.2.2-1 uploaded by Daniel E. Markle, fixed upstream, obsolete patch by Chris Lamb.

Some uploads fixed some reproducibility issues, but not all of them:

apt/1.2 uploaded by Julian Andres Klode, original patch by Mattia Rizzolo.

ecere-sdk/0.44.14-1 by Jerome St-Louis.

epubcheck/4.0.1-2 by Eugene Zhukov.

hplip/3.15.11+repack0-1 by Didier Raboud.

irsim/9.7.93-1 uploaded by Roland Stigge, original patch by Chris Lamb.

julia/0.4.3-1 uploaded by Peter Colberg, fix by Graham Inggs.

magics++/2.26.2-1 by Alastair McKinstry.

mailagent/1:3.1-81-1 by Manoj Srivastava.

nasm/2.11.08-1 uploaded by Anibal Monsalve Salazar, original patch by Valentin Lorentz.

ns3/3.22+dfsg-2 uploaded by Martin Quinson, original patch by Juan Picca.

shotwell/0.22.0-3 by J rg Frings-F rst.

Untested changes:

firmware-nonfree/20160110-1 by Ben Hutchings (non-free).

nvidia-graphics-drivers-legacy-304xx/304.131-3 by Andreas Beckmann (non-free).

nvidia-graphics-drivers-legacy-340xx/340.96-2 by Andreas Beckmann (non-free).

nvidia-graphics-drivers/340.96-4 by Andreas Beckmann (non-free).

reproducible.debian.net Once again, Vagrant Cascadian is providing another `armhf` build system, allowing to run 6 more `armhf` builder jobs, right there. (h01ger) Stop requiring a modified debhelper and adapt to the latest dpkg experimental version by providing a predetermined identifier for the `.buildinfo` filename. (Mattia Rizzolo, h01ger) New X.509 certificates were set up for jenkins.debian.net and reproducible.debian.net using Let's Encrypt!. Thanks to GlobalSign for providing certificates for the last year free of charge. (h01ger)

Package reviews 131 reviews have been removed, 85 added and 32 updated in the previous week. FTBFS issues filled: 29. Thanks to Chris Lamb, Mattia Rizzolo, and Niko Tyni. New issue identified: timestamps_in_manpages_added_by_golang_cobra.

Misc. Most of the minutes from the meetings held in Athens in December 2015 are now available to the public.

14 January 2016

Zlatan Todori : Happy New Year (again?)

Yes again, I like it that way. Twice! Anyway, we have here what is called Serbian New Year (it is again, Orthodox by Julian calendar). So, if you missed or think you can do better New Year's resolutions - feel free to join the party (just a notice, besides firework we have a lot of gun fire here during that time. A LOT.). Advice for everyone's resolution list: be better version of yourself, have more happy days. Cheers. (oh, yes, get more involved in Debian)

11 January 2016

Norbert Preining: 10 years TeX Live in Debian

I recently dug through my history of involvement with TeX (Live), and found out that in January there are a lot of anniversaries I should celebrate: 14 years ago I started building binaries for TeX Live, 11 years ago I proposed the packaging TeX Live for Debian, 10 years ago the TeX Live packages entered Debian. There are other things to celebrate next year (2017), namely the 10 year anniversary of the (not so new anymore) infrastructure in short tlmgr of TeX Live packaging, but this will come later. In this blog post I want to concentrate on my involvement in TeX Live and Debian. TeX Live/Debian

Those of you not interested in boring and melancholic look-back onto history can safely skip reading this one. For those a bit interested in the history of TeX in Debian, please read on. Debian releases and TeX systems The TeX system of choice has been for long years teTeX, curated by Thomas Esser. Digging through the Debian Archive and combining it with changelog entries as well as personal experiences since I joined Debian, here is a time line of TeX in Debian, all to my best knowledge.

Date	Version	Name	teTeX/TeX Live	Maintainers
1993-96	<1	?	?	Christoph Martin
6/1996	1.1	Buzz	?
12/1996	1.2	Rec	?
6/1997	1.3	Bo	teTeX 0.4
7/1998	2.0	Ham	teTeX 0.9
3/1999	2.1	Slink	teTeX 0.9.9N
8/2000	2.2	Potato	teTeX 1.0
7/2002	3.0	Woody	teTeX 1.0
6/2005	3.1	Sarge	teTeX 2.0	Atsuhito Kohda
4/2007	4.0	Etch	teTeX 3.0, TeX Live 2005	Frank K ster, NP
2/2009	5.0	Lenny	TeX Live 2007	NP
2/2011	6.0	Squeeze	TeX Live 2009
5/2013	7.0	Whezzy	TeX Live 2012
4/2015	8.0	Jessie	TeX Live 2014
???	???	Stretch	TeX Live 2015

The history of TeX in Debian is thus split more or less in 10 years teTeX, and 10 years TeX Live. While I cannot check back to the origins, my guesses are that already in the very first releases (te)TeX was included. The first release I can confirm (via the Debian archive) shipping teTeX is the release Bo (June 1997). Maintainership during the first 10 years showed some fluctuation: The first years/releases (till about 2002) were dominated by Christoph Martin with Adrian Bunk and few others, who did most packaging work on teTeX version 1. After this Atsuhito Kohda with help from Hilmar Preusse and some people brought teTeX up to version 2, and from 2004 to 2007 Frank K ster, again with help of Hilmar Preusse and some other, took over most of the work on teTeX. Other names appearing throughout the changelog are (incomplete list) Julian Gilbey, Ralf Stubner, LaMont Jones, and C.M Connelly (and many more bug-reporters and fixers). Looking at the above table I have to mention the incredible amount of work that both Atsuhito Kohda and Frank K ster have put into the teTeX packages, and many of their contributions have been carried over into the TeX Live packages. While there haven t been many releases during their maintainership, their work has inspired and supported the packaging of TeX Live to a huge extend. Start of TeX Live I got involved in TeX Live back in 2002 when I started building binaries for the alpha-linux architecture. I can t remember when I first had the idea to package TeX Live for Debian, but here is a time line from my first email to the Debian Developers mailing list concerning TeX Live, to the first accepted upload:

Date	Subject/Link	Comment
2005-01-11	binaries for different architectures in debian packages	The first question concerning packaging TeX Live, about including pre-built binaries
2005-01-25	Debian-TeXlive Proposal II	A better proposal, but still including pre-built binaries
2005-05-17	Proposal for a tex-base package	Proposal for tex-base, later tex-common, as basis for both teTeX and TeX live packages
2015-06-10	Bug#312897: ITP: texlive	ITP bug for TeX Live
2005-09-17	Re: Take over of texinfo/info packages	Taking over texinfo which was somehow orphaned started here
2005-11-28	Re: texlive-basic_2005-1_i386.changes REJECTED	My answer to the rejection by ftp-master of the first upload. This email sparked a long discussion about packaging and helped improve the naming of packages (but not really the packaging itself).
2006-01-12	Upload of TeX Live 2005-1 to Debian	The first successful upload
2006-01-22	Accepted texlive-base 2005-1 (source all)	TeX Live packages accepted into Debian/experimental

One can see from the first emails that at that time I didn t have any idea about Debian packaging and proposed to ship the binaries built within the TeX Live system on Debian. What followed was first a long discussion about whether there is any need for just another TeX system. The then maintainer Frank K ster took a clear stance in favor of including TeX Live, and after several rounds of proposals, tests, rejections and improvements, the first successful upload of TeX Live packages to Debian/experimental happened on 12 January 2006, so exactly 10 years ago. Packaging Right from the beginning I used a meta-packaging approach. That is, instead of working directly with the source packages, I wrote (Perl) scripts that generated the source packages from a set of directives. There were several reasons why I choose to introduce this extra layer:

The original format of the TeX Live packaging information (tpm) were xml files that were parsed with an XML parser (libxml). I guess (from what I have seen over the years) I was the only one ever properly parsing these .tpm files for packaging.
TeX Live packages were often reshuffled, and Debian package name changed, which would have caused a certain level of pain for the creation of original tar files and packaging.
Flexibility in creating additional packages and arbitrary dependencies

Till now I am not 100% sure whether it was the best idea, but the scripts remain in place till now, only adapted to the new packaging paradigm in TeX Live (without xml) and adding new functionality. This allows me to just kick off one script that does do all the work, including building .orig.tar.gz, source packages, binary packages. For those interested to follow the frantic activity during the first years, there is a file CHANGES.packaging which for the years from 2005 to 2011 documents very extensively the changes I made in these years. I don t want to count the hours the went into all this

Development over the years TeX Live 2005 was just another TeX system but not the preferred one in Debian Etch and beyond. But then in May 2006, Thomas Esser announced the end of development of teTeX, which cleared the path for TeX Live as main TeX system in Debian (and the world!). The next release of Debian, Lenny (1/2009), already carried only TeX Live. Unfortunately it was only TeX Live 2007 and not 2008, mostly due to me having been involved in rewriting the upstream infrastructure based on Debian package files instead of the notorious xml files. This took quite a lot of attention and time from Debian away to upstream development, but this will be discussed in a different post. Similarly, the release of TeX Live included in Debian Squeeze (released 2/2011) was only TeX Live 2009 (instead of 2010), but since then (Wheezy and Jessie) the releases of TeX Live in Debian were always the latest released ones. Current status Since about 2013 I am trying to keep a regular schedule of new TeX Live packages every month. These helps me to keep up with the changes in upstream packaging and reduces the load of packaging a new release of TeX Live. It also bring to users of unstable and testing a very up-to-date TeX system, where packages at most lack 1 month of behind the TeX Live net updates. Future As most of the readers here know, besides caring for TeX (Live) and related packages in Debian, I am also responsible for the TeX Live Manager (tlmgr) and most of upstream s infrastructure including network distribution. Thus, my (spare, outside work) time needs to be distributed between all these projects (and some others) which leaves less and less time for Debian packaging. Fortunately the packaging is in a state that making regular updates once a month is less of a burden, since most steps are automatized. What is still a bit of a struggle is adapting the binary package (src:texlive-bin) to new releases. But also this has become simpler due to less invasive changes over the years. All in all, I don t have many plans for TeX Live in Debian besides keeping the current system running as it is. Search for and advise to future maintainers and collaborators I would be more than happy if new collaborators appear, with fresh ideas and some spare time. Unfortunately, my experience over these 10 years with people showing up and proposing changes (anyone remembers the guy proposing a complete rewrite in ML or so?) is that nobody really wants to invest time and energy, but search for quick solutions. This is not something that will work with a package like TeX Live, sized of several gigabyte (the biggest in the Debian archive), and complicated inner workings. I advise everyone being interested in helping to package TeX Live for Debian, to first install normal TeX Live from TUG, get used to what actions happen during updates (format rebuilds, hyphenation patters, map file updates). One does not need to have a perfect understanding of what exactly happens down there in the guts (I didn t have in the beginning, either), but if you want to help packaging and never heard about what format dumps or map files are, then this might be a slight obstacle. Conclusion TeX Live is the only TeX system in wide use across lots of architectures and operating systems, and the only comparable system, MikTeX, is Windows specific (also there are traces of ports to Unix). Backed by all the big user groups of TeX, TeX Live will remain the prime choice for the foreseeable future, and thus also TeX Live in Debian.

10 January 2016

Juliana Louback: NLP: Viterbi Named-Entity Tagger

During my MSc program, I was lucky to squeeze into Michael Collin s NLP class. We used his Coursera course as part of the program, which I d highly recommend. Recently I decided to review my NLP studies and I believe the best way to learn or relearn a subject is to teach it. This is one in a series of 4 posts with a walk-through of the algorithms we implemented during the course. I ll provide links to my code hosted on Github. Disclaimer: Before taking this NLP course, the only thing I knew about Python was that it s the one without curly brackets . I learned Python on the go while implementing these algorithms. So if I did anything against Python code conventions or flat-out heinous, I apologize and thank you in advance for your understanding. Feel free to write and let me know.

The Concept To quote Wikipedia, Named-entity recognition (I ve always known it as tagging) is a subtask of information extraction that seeks to locate and classify elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quanitites, monetary values, percentages, etc. For example, the algorithm receives as input some text Bill Gates founded Microsoft in 1975. and outputs Bill Gates[person] founded Microsoft[organization] in 1975[date]. Off the top of my head, some useful applications are document matching (ex. a document containing Gates[person] may not be on the same topic as one containing gates[object]) and query searches. I m sure there are lots more, if you check out Collin s Coursera course he may discuss this in greater depth.

The Requirements Development data: The file ner_dev.dat provided by prof. Michael Collins has a series of sentences separated by an empty line, one word per line. Training data: The file ner_train.dat provided by prof. Michael Collins has a series of sentences separated by an empty line, one word and tag per line, speparated by a space. Word-tag count data: The file ner.counts has the format [count] [type of tag] [label] [word]. The tags used are RARE, O, I-MISC, I-PER, I-ORG, I-LOC, B-MISC, B-PER, B-ORG, B-LOC. The tag O means it s not an NE. This file is generated by count_freqs.py, a script provided by prof. Michael Collins. Run count_freqs.py on the training data ner_train.dat

The Algorithm Python code: viterbi.py Usage: python viterbi.py ner.counts ngram.counts [input_file] > [output_file] Summary: The Viterbi algorithm finds the maximum probability path for a series of observations, based on emission and transition probabilities. In a Markov Process, emission is the probability of an output given a state and transition is the probability of transitioning to the state given the previous states. In our case, the emission parameter e(x y) = the probability of the word being x given you attributed tag y. If your training data had 100 counts of person tags, one of which is the word London (I know a guy who named his kid London), e( London person ) = 0.01. Now with 50 counts of location tags, 5 of which are London , e( London location ) = 0.1 which clearly trumps 0.01. The transition parameter q(y_i y_i-1, y_i-2) = the probability of putting tag y in position i given it s two previous tags. This is calculated by Count(trigram)/Count(bigram). For each word in the development data, he Viterbi algorithm will associate a score for a word-tag combo based on the emission and transition parameters it obtained from the training data. It does this for every possible tag and sees which is more likely. Clearly this won t be 100% correct as natural language is unpredictable, but you should get pretty high accuracy. Optional Preprocessing Re-label words in training data with frequency < 5 as RARE - This isn t required, but useful. Re-run count_freqs.py if used. Python code: label_rare.py Usage: python label_rare.py [input_file] Pseudocode:

Uses Python Counter to obtain word counts in [input_file]; removes all word-count pairs with count < 5, store remaining pairs in a dictionary named rare_words.

Iterates through each line in [input file], checks if word is in rare words dictionary, if so, replaces word with RARE.

Step 1. Get Count(y) and Count(x~y) Python code: emission_counts.py Pseudocode:

Iterate through each line in ner.counts file and store each word-label-count combo in a dictionary count_xy and update the dictionary of count_y. For example count_xy[Peter][I-PER] returns the number of times the word Peter was labeled I-PER in the training data and count_y[I-PER] the total number of I-PER tags. The dictionary count_y contains 8 items, one for each label ( RARE , O, I-MISC, I-PER, I-ORG, I-LOC, B-MISC, B-PER, B-ORG, B-LOC);

Return count_xy, count_y

Step 2. Get bigram and trigram counts Python code: transition_counts.py Pseudocode:

Iterate through each line in the n-gram_counts file

If the line contains 2-GRAM add an item to the bigram_counts dictionary using the bigram (two space-separated labels following the tag type 2-gram ) as key, count as value. This dictionary will contain Count(y_i-2,y_i-1).

If the line contains 3-GRAM , add an item to the trigram_counts dictionary using the trigram as key, count as value. This dictionary will contain Count(y_i-2, y_i-1, y_i).

Return dictionaries of bigram and trigram counts.

Step 3. Viterbi (For each line in the [input_file]):

If the word was seen in training data (present in the count_xy dictionary), for each of the possible labels for the word:

Calculate emission = count_xy[word][label] / float(count_y[label]

Calculate transition = trigram_counts[trigram])/float(bigram_counts[bigram] Note: y_i-2 = , y_i-1 = for the first round

Set probability = emission x transition

Update max(probability) and arg max if needed. 2 If the word was not seen in the training data:

Calculate emission = count xy[RARE][label] / float(count y[label].

Calculate q(y_i y_i-1, y_i-2) = trigram counts[trigram])/float(bigram counts[bigram]. Note: y_i-2 = , y_i-1 = for the first round

Set probability = emission transition

Update max(probability) if needed, arg max = RARE

Write arg max and log(max(probability)) to output file.

Update y_i-2, y_i-1.

Update y_i-2, y_i-1.

Evaluation Prof. Michael Collins provided an evaluation script eval_ne_tagger.py to verify the output of your Viterbi implementation. Usage: python eval_ne_tagger.py ner_dev.key [output_file]

30 December 2015

Julian Andres Klode: APT 1.1.8 to 1.1.10 going faster

Not only do I keep incrementing version numbers faster than ever before, APT also keeps getting faster. But not only that, it also has some bugs fixed and the cache is now checked with a hash when opening. Important fix for 1.1.6 regression Since APT 1.1.6, APT uses the configured xz compression level. Unfortunately, the default was set to 9, which requires 674 MiB of RAM, compared to the 94 MiB required at level 6. This caused the test suite to fail on the Ubuntu autopkgtest servers, but I thought it was just some temporary hickup on their part, and so did not look into it for the 1.1.7, 1.1.8, and 1.1.9 releases. When the Ubuntu servers finally failed with 1.1.9 again (they only started building again on Monday it seems), I noticed something was wrong. Enter git bisect. I created a script that compiles the APT source code and runs a test with ulimit for virtual and resident memory set to 512 (that worked in 1.1.5), and let it ran, and thus found out the reason mentioned above. The solution: APT now defaults to level 6. New Features APT 1.1.8 introduces /usr/lib/apt/apt-helper cat-file which can be used to read files compressed by any compressor understood by APT. It is used in the recent apt-file experimental release, and serves to prepare us for a future in which files on the disk might be compressed with a different compressor (such as LZ4 for Contents files, this will improve rred speed on them by factor 7). David added a feature that enables servers to advertise that they do not want APT to download and use some Architecture: all contents when they include all in their list of architectures. This is to allow archives to drop Architecture: all packages from the architecture-specific content files, to avoid redundant data and (thus) improve the performance of apt-file. Buffered writes APT 1.1.9 introduces buffered writing for rred, reducing the runtime by about 50% on a slowish SSD, and maybe more on HDDs. The 1.1.9 release is a bit buggy and might mess up things when a write syscall is interrupted, this is fixed in 1.1.10. Cache generation improvements APT 1.1.9 and APT 1.1.10 improve the cache generation algorithms in several ways: Switching a lookup table from std::map to std::unordered_map, providing an inline isspace_ascii() function, and inlining the tolower_ascii() function which are tiny functions that are called a lot. APT 1.1.10 also switches the cache s hash function to the DJB hash function and increases the default hash table sizes to the smallest prime larger than 15000, namely 15013. This reduces the average bucket size from 6.5 to 4.5. We might increase this further in the future. Checksum for the cache, but no more syncs Prior to APT 1.1.10 writing the cache was a multi-part process:

Write the the cache to a temporary file with the dirty bit set to true
Call fsync() to sync the cache
Write a new header with the dirty bit set to false
Call fsync() to sync the new header
(Rename the temporary file to the target name)

The last step was obviously not needed, as we could easily live with an intact cache that has its dirty field set to false, as we can just rebuild it. But what matters more is step 2. Synchronizing the entire 40 or 50 MB takes some time. On my HDD system, it consumed 56% of the entire cache generation time, and on my SSD system, it consumed 25% of the time. APT 1.1.10 does not sync the cache at all. It now embeds a hashsum (adler32 for performance reasons) in the cache. This helps ensure that no matter what parts of the cache are written in case of some failure somewhere, we can still detect a failure with reasonable confidence (and even more errors than before). This means that cache generation is now much faster for a lot of people. On the bad side, commands like apt-cache show that previously took maybe 10 ms to execute can now take about 80 ms. Please report back on your performance experience with 1.1.10 release, I m very interested to see if that works reasonably for other people. And if you have any other idea how to solve the issue, I d be interested to hear them (all data needs to be written before the header with dirty=0 is written, but we don t want to sync the data). Future work We seem to have a lot of temporary (?) std::string objects during the cache generation, accounting for about 10% of the run time. I m thinking of introducing a string_view class similar to the one proposed for C++17 and make use of that. I also thought about calling posix_fadvise() before starting to parse files, but the cache generation process does not seem to spend a lot of its time in system calls (even with all caches dropped before the run), so I don t think this will improve things. If anyone has some other suggestions or patches for performance stuff, let me know.
Filed under: Debian, Ubuntu

Next.

Previous.